SAM Doc : Installing SAM Update-17.1
This page last changed on Oct 11, 2012 by tarragon.
This page describes how to install and configure the SAM-Nagios node type from scratch.
Environment

Disable SELinux in /etc/selinux/config:

    SELINUX=disabled
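The setting above can be checked with a short script before continuing. This is a minimal sketch: the function name selinux_configured_mode is hypothetical, and it only reads the configuration file. A change to that file normally takes effect at the next boot, so the running mode (as reported by getenforce) may still differ.

```shell
# Print the mode configured by the SELINUX= line of an SELinux config file.
# Defaults to /etc/selinux/config; another path can be passed as $1.
# (Function name is illustrative, not part of SAM or SELinux.)
selinux_configured_mode() {
    local config_file="${1:-/etc/selinux/config}"
    # Match only uncommented "SELINUX=" lines (not SELINUXTYPE=),
    # print the value, and keep the last match if there are several.
    sed -n 's/^[[:space:]]*SELINUX=[[:space:]]*\([A-Za-z]*\).*/\1/p' "$config_file" | tail -n 1
}
```

Typical use is to compare the result with "disabled" and print a warning otherwise, for example: [ "$(selinux_configured_mode)" = "disabled" ] || echo "SELinux is not disabled in the config file" >&2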
Requirements

You need to install a host certificate in order to secure the Nagios web portal. The certificate and key should be placed in the standard location:

    ls -l /etc/grid-security/host*
    -rw-r--r-- 1 root root 2286 Oct 28 19:26 /etc/grid-security/hostcert.pem
    -r-------- 1 root root  887 Oct 28 19:25 /etc/grid-security/hostkey.pem

Check that the certificate can be used as an SSL client:

    openssl x509 -in /etc/grid-security/hostcert.pem -noout -purpose | grep "SSL client"
    SSL client : Yes

Repositories

Install YUM and rpmforge packages:
Remove the old lcg-CA repository, if installed:
Repositories List

Configure the following repositories:
Repository Priorities

Install yum-priorities:

    yum install yum-priorities

Modify repository files:
Installation

    yum install lcg-CA
    yum install httpd
    yum install nagios.x86_64   # make sure that nagios from EGI SAM repository is installed
    yum --exclude=\*saga\* --exclude=\*SAGA\* groupinstall 'glite-UI (production - x86_64)'
    yum install sam-nagios
Configuration

A SAM-Nagios node type can be configured in three different ways in order to monitor different sets of sites/services:
The configuration of all SAM-Nagios boxes is based on YAIM (https://twiki.cern.ch/twiki/bin/view/EGEE/YAIM).

NGI Nagios

For NGI Nagios instances the following variables must be set (OPS VO example).

Edit YAIM configuration file:

    # Generic
    SITE_NAME=egee.srce.hr
    SITE_BDII_HOST=ce1-egee.srce.hr
    PX_HOST=se1-egee.srce.hr
    BDII_HOST=bdii-egee.srce.hr
    VOS="dteam ops"
    VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
    VO_OPS_VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops 24' 'ops voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch ops 24'"
    VO_OPS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"
    VO_DTEAM_VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
    VO_DTEAM_VOMSES="'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
    VO_DTEAM_VOMS_CA_DN="'/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006'"
    RB_HOST=skurut2.cesnet.cz                            # irrelevant, RB is unsupported
    VO_DTEAM_WMS_HOSTS="wms204.cern.ch wms205.cern.ch"   # set to your NGI WMSes
    VO_OPS_WMS_HOSTS="wms204.cern.ch wms205.cern.ch"     # set to your NGI WMSes

    # Nagios
    NAGIOS_HOST=nagiosdev001.cern.ch
    NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
    NCG_NAGIOS_ADMIN=eimamagi@srce.hr
    NAGIOS_ROLE=ngi
    NCG_PROBES_TYPE=local
    NCG_VO=ops
    NAGIOS_HTTPD_ENABLE_CONFIG=true
    NAGIOS_SUDO_ENABLE_CONFIG=true
    NAGIOS_NCG_ENABLE_CONFIG=true
    NAGIOS_NAGIOS_ENABLE_CONFIG=true
    NAGIOS_CGI_ENABLE_CONFIG=true
    NAGIOS_NSCA_PASS=MY_PASS

    # NGI/ROC Nagios
    COUNTRY_NAME=Croatia
    NCG_GOCDB_ROC_NAME=NGI_HR
    NAGIOS_SUDO_ENABLE_CONFIG=true

    # DB data
    MYSQL_ADMIN="MY_MYSQL_PASS"
    DB_PASS="MY_MRS_PASS"
    MYEGI_ADMIN_NAME="Admin Name"
    MYEGI_ADMIN_EMAIL="admin@address.hr"
    MYEGI_DEFAULT_PROFILE="ROC"   # profile to be displayed by default in MyEGI
    MYEGI_REGION="NGI_HR"
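Before running YAIM it can help to confirm that the variables from the example above are actually assigned in site-info.def. The following is a sketch only: missing_yaim_vars is a hypothetical helper, it checks for the presence of an assignment and not for a sensible value, and the list of variables to check is whatever the caller passes in.

```shell
# Print every variable name (arguments 2..n) that has no "NAME=" assignment
# in the YAIM configuration file given as argument 1.
# (Helper name is illustrative; it is not part of YAIM.)
missing_yaim_vars() {
    local file="$1"
    shift
    local var
    for var in "$@"; do
        # Commented-out assignments ("# NAME=...") do not count.
        grep -Eq "^[[:space:]]*${var}=" "$file" || echo "$var"
    done
}
```

For the NGI example one might call: missing_yaim_vars site-info.def NAGIOS_HOST NAGIOS_ROLE NCG_VO NCG_PROBES_TYPE NAGIOS_NSCA_PASS NCG_GOCDB_ROC_NAME. Empty output means every listed variable has an assignment.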
Run YAIM:

    /opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-UI -n glite-NAGIOS

Run myproxy-init:

    myproxy-init -l nagios -s se1-egee.srce.hr -k NagiosRetrieve-nagiosdev001.cern.ch-dteam -c 336 -x -Z "/DC=ch/DC=cern/OU=computers/CN=nagiosdev001.cern.ch"

Site Nagios

For Site Nagios instances the following variables must be set (using remote-only probes).

Edit YAIM configuration file:

    # Generic
    SITE_NAME=egee.srce.hr
    SITE_BDII_HOST=ce1-egee.srce.hr
    PX_HOST=se1-egee.srce.hr
    BDII_HOST=bdii-egee.srce.hr
    VOS="dteam ops"
    VO_DTEAM_VOMS_SERVERS="'vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'"
    VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"

    # Nagios
    NAGIOS_HOST=nagiosdev001.cern.ch
    NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
    NCG_NAGIOS_ADMIN=eimamagi@srce.hr
    NAGIOS_ROLE=site
    NCG_PROBES_TYPE=remote
    NCG_VO=dteam
    NAGIOS_HTTPD_ENABLE_CONFIG=true
    NAGIOS_SUDO_ENABLE_CONFIG=true
    NAGIOS_NCG_ENABLE_CONFIG=true
    NAGIOS_NAGIOS_ENABLE_CONFIG=true
    NAGIOS_CGI_ENABLE_CONFIG=true
    NCG_REMOTE_USE_NAGIOS=true
    NAGIOS_NSCA_PASS=MY_PASS

Run YAIM:

    /opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-NAGIOS
For Site Nagios instances that also run local probes (NCG_PROBES_TYPE=remote,local), edit the YAIM configuration file:

    # Generic
    SITE_NAME=egee.srce.hr
    SITE_BDII_HOST=ce1-egee.srce.hr
    PX_HOST=se1-egee.srce.hr
    BDII_HOST=bdii-egee.srce.hr
    VOS="dteam ops"
    VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
    VO_OPS_VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops 24' 'ops voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch ops 24'"
    VO_OPS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"
    VO_DTEAM_VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
    VO_DTEAM_VOMSES="'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
    VO_DTEAM_VOMS_CA_DN="'/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006'"
    RB_HOST=skurut2.cesnet.cz
    VO_DTEAM_WMS_HOSTS="wms204.cern.ch wms205.cern.ch"

    # Nagios
    NAGIOS_HOST=nagiosdev001.cern.ch
    NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
    NCG_NAGIOS_ADMIN=eimamagi@srce.hr
    NAGIOS_ROLE=site
    NCG_PROBES_TYPE=remote,local
    NCG_VO=dteam
    NAGIOS_HTTPD_ENABLE_CONFIG=true
    NAGIOS_SUDO_ENABLE_CONFIG=true
    NAGIOS_NCG_ENABLE_CONFIG=true
    NAGIOS_NAGIOS_ENABLE_CONFIG=true
    NAGIOS_CGI_ENABLE_CONFIG=true
    NCG_REMOTE_USE_NAGIOS=true
    NAGIOS_NSCA_PASS=MY_PASS
Run YAIM:

    /opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-UI -n glite-NAGIOS

Run myproxy-init:

    myproxy-init -l nagios -s se1-egee.srce.hr -k NagiosRetrieve-nagiosdev001.cern.ch-dteam -c 336 -x -Z "/DC=ch/DC=cern/OU=computers/CN=nagiosdev001.cern.ch"

VO Nagios

For VO Nagios instances the following variables must be set (CMS VO example using VO feeds).
Edit YAIM configuration file:

    # Generic
    SITE_NAME=egee.srce.hr
    SITE_BDII_HOST=ce1-egee.srce.hr
    PX_HOST=se1-egee.srce.hr
    BDII_HOST=bdii-egee.srce.hr
    RB_HOST=skurut2.cesnet.cz                            # irrelevant, RB is unsupported
    VOS="dteam ops cms"
    VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
    VO_CMS_VOMS_SERVERS="'vomss://voms.cern.ch:8443/voms/cms?/cms/'"
    VO_OPS_VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops 24' 'ops voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch ops 24'"
    VO_CMS_VOMSES="'cms lcg-voms.cern.ch 15002 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch cms 24' 'cms voms.cern.ch 15002 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch cms 24'"
    VO_OPS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"
    VO_CMS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"
    VO_DTEAM_VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
    VO_DTEAM_VOMSES="'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
    VO_DTEAM_VOMS_CA_DN="'/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006'"
    VO_DTEAM_WMS_HOSTS="wms204.cern.ch wms205.cern.ch"   # set to your NGI WMSes
    VO_OPS_WMS_HOSTS="wms204.cern.ch wms205.cern.ch"     # set to your NGI WMSes
    VO_CMS_WMS_HOSTS="wms204.cern.ch wms205.cern.ch"     # set to your NGI WMSes

    # Nagios
    NAGIOS_HOST=nagiosdev001.cern.ch
    NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
    NCG_NAGIOS_ADMIN=eimamagi@srce.hr
    NAGIOS_ROLE=vo
    NCG_PROBES_TYPE=local
    NCG_VO=cms
    NAGIOS_HTTPD_ENABLE_CONFIG=true
    NAGIOS_SUDO_ENABLE_CONFIG=true
    NAGIOS_NCG_ENABLE_CONFIG=true
    NAGIOS_NAGIOS_ENABLE_CONFIG=true
    NAGIOS_CGI_ENABLE_CONFIG=true
    NAGIOS_NSCA_PASS=MY_PASS

    # VO Nagios
    NCG_TOPOLOGY_USE_GOCDB=false
    NCG_TOPOLOGY_USE_LDAP=false
    NCG_REMOTE_USE_NAGIOS=false
    NCG_USE_ATP_VO_FEED=true
    ATP_ROOT_URL="https://localhost/atp"

    # ATP
    ATP_VO_FEEDS="<list of VOs>"
    ATP_VO_FEED_<vo1>="<vo feed url>"
    ATP_VO_FEED_<vo2>="<vo feed url>"

    # POEM
    POEM_WEB_ENABLE=True
    POEM_NAMESPACE="<namespace>"
    POEM_SYNC_URLS="http://localhost/poem/api/0.1/json/"
    POEM_SYNC_NS_RESTRICT=""

    # DB data
    MYSQL_ADMIN="MY_MYSQL_PASS"
    DB_PASS="MY_MRS_PASS"
    MYEGI_ADMIN_NAME="Admin Name"
    MYEGI_ADMIN_EMAIL="admin@address.hr"
    MYEGI_DEFAULT_PROFILE=<profile>   # name of the profile you have defined via POEM without namespace
    MYEGI_REGION="NGI_HR"
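In the examples, each VO listed in VOS has matching per-VO variables named VO_<NAME>_..., with the VO name in upper case (VO_OPS_VOMSES, VO_CMS_VOMSES and so on). A sketch of a consistency check follows; list_vos and missing_vo_vars are hypothetical helper names, and the upper-casing rule is assumed only for simple VO names such as those shown here.

```shell
# Print the VOs from the VOS="..." line of a YAIM config file, one per line.
# (Both helpers are illustrative and not part of YAIM.)
list_vos() {
    sed -n 's/^[[:space:]]*VOS="\([^"]*\)".*/\1/p' "$1" | tail -n 1 | tr -s ' ' '\n'
}

# For every VO in VOS, print each VO_<NAME>_<SUFFIX> variable that has no
# assignment in the file. Suffixes are passed as arguments 2..n.
missing_vo_vars() {
    local file="$1"
    shift
    local vo name suffix
    for vo in $(list_vos "$file"); do
        # Assumption: simple VO names map to upper case (cms -> CMS).
        name="$(printf '%s' "$vo" | tr '[:lower:]' '[:upper:]')"
        for suffix in "$@"; do
            grep -Eq "^[[:space:]]*VO_${name}_${suffix}=" "$file" || echo "VO_${name}_${suffix}"
        done
    done
}
```

For example, missing_vo_vars site-info.def VOMS_SERVERS VOMSES VOMS_CA_DN prints any per-VO variable from that set that was forgotten when a new VO was added to VOS.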
Run YAIM:

    /opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-UI -n glite-NAGIOS

Run myproxy-init:

    myproxy-init -l nagios -s se1-egee.srce.hr -k NagiosRetrieve-nagiosdev001.cern.ch-dteam -c 336 -x -Z "/DC=ch/DC=cern/OU=computers/CN=nagiosdev001.cern.ch"

Additional Configuration

Migrating from Hash_local.pm to ncg-metric-config

If you have custom settings in Hash_local.pm, please note that starting from Update-17 Hash_local.pm is no longer supported. After the upgrade, admins need to convert the Hash_local.pm metric configuration to the new format. The conversion is done in the following way:

    /usr/libexec/hashlocal-to-json.pl > /etc/ncg-metric-config.d/local.conf

Note: the generated config will only contain metric descriptions. In order to include the metrics in the final configuration, a POEM profile with the appropriate service-metric mappings must be configured. See POEM configuration.
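Redirecting straight into local.conf leaves an empty or partial file behind if the converter fails. One way to guard against that is to write to a temporary file first and move it into place only on success. This is a sketch under assumptions: install_converted_config is a hypothetical helper, and the temporary file is created next to the target so that the final mv stays on one filesystem.

```shell
# Run a converter command and install its output as the target file, but only
# if the command succeeds and produces non-empty output.
# (Helper name is illustrative; it is not shipped with SAM.)
install_converted_config() {
    local converter="$1" target="$2" tmp
    tmp="$(mktemp "${target}.XXXXXX")" || return 1
    if "$converter" > "$tmp" && [ -s "$tmp" ]; then
        chmod 644 "$tmp"       # mktemp creates the file with mode 600
        mv "$tmp" "$target"
    else
        rm -f "$tmp"           # leave any existing target untouched
        return 1
    fi
}
```

With the paths from this page the call would be: install_converted_config /usr/libexec/hashlocal-to-json.pl /etc/ncg-metric-config.d/local.conf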
Failover Nagios - configurable hot-standby mode

Starting from Update-13 SAM supports deployment of hot-standby (active/active) instances. The system works in the following way:
Backup instance can be defined in several ways:
In case of backup configuration without YAIM the following additional step is needed:

    /sbin/chkconfig send-to-dashboard off
    /sbin/service send-to-dashboard stop

In order to turn the backup instance into the active one, the SAM administrator has to remove the BACKUP_INSTANCE variable. If YAIM is not used the following additional step is needed:

    /sbin/chkconfig send-to-dashboard on
    /sbin/service send-to-dashboard start

Robot certificates

Starting from Update-09 SAM supports usage of robot certificates instead of MyProxy credentials. If your CA supports robot certificates, we suggest switching to them, as they are easier to maintain. Robot certificates also provide better availability, as SAM does not depend on the availability of a MyProxy server. In order to use robot certificates set the following YAIM variables:

    NCG_USE_ROBOT_CERT=true
    # Robot cert and key can be different for each VO
    # and standard YAIM VO notation is used
    VO_OPS_ROBOT_CERT=/etc/nagios/globus/robot-cert.pem
    VO_OPS_ROBOT_KEY=/etc/nagios/globus/robot-key.pem
    VO_DTEAM_ROBOT_CERT=/etc/nagios/globus/robot-cert.pem-dteam
    VO_DTEAM_ROBOT_KEY=/etc/nagios/globus/robot-key.pem-dteam
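A quick sanity check on each robot certificate/key pair can catch a wrong path or an exposed key before YAIM is run. The sketch below is not part of SAM: check_robot_pair is a hypothetical helper, it relies on GNU stat (stat -c), and the rule that the key should carry no group or other permissions is an assumption based on common practice for private keys.

```shell
# Verify that a robot certificate and key exist and are readable, and that the
# key has no group/other permission bits set.
# (Helper name is illustrative; requires GNU stat.)
check_robot_pair() {
    local cert="$1" key="$2" mode
    [ -r "$cert" ] || { echo "missing or unreadable certificate: $cert" >&2; return 1; }
    [ -r "$key" ]  || { echo "missing or unreadable key: $key" >&2; return 1; }
    mode="$(stat -c '%a' "$key")" || return 1
    case "$mode" in
        *00) return 0 ;;   # e.g. 400 or 600: owner-only access
        *)   echo "key $key has mode $mode; expected no group/other access" >&2
             return 1 ;;
    esac
}
```

Example: check_robot_pair /etc/nagios/globus/robot-cert.pem /etc/nagios/globus/robot-key.pem. Certificate lifetime can be checked separately with openssl x509 -checkend <seconds> -noout -in <cert>, which exits non-zero if the certificate expires within the given number of seconds.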
ACE support in MyEGI

Currently this is available only for the central MyEGI instance. YAIM configuration:

    MYEGI_ACE=true

Setting alternative SE for metric org.sam.WN-RepRep

Starting from release Update-07, it is possible to specify more than one replication SE for the WN replica test org.sam.WN-RepRep. Static and/or dynamic mechanisms are possible. In order to define a static list of comma-separated hostnames set the following YAIM variable:

    JOBSUBMIT_WN_SE_REP=se1[,se2,se3...]

The dynamic list is filled with SEs defined on the Nagios instance that recently passed the org.sam.SRM-All set of tests. In order to use the dynamic list set the following YAIM variable:

    JOBSUBMIT_WN_SE_REP_FILE=filename

The filename must be given without a path. The org.sam.(CREAM)CE-JobState metric(s) take at most 3 hosts from the file and, if JOBSUBMIT_WN_SE_REP is defined, append them to the static list. On the WN, org.sam.WN-RepRep tries to replicate to the SEs in the provided order until a replication succeeds. The metric returns CRITICAL if the file could not be replicated to any of the SEs. This fixes https://tomtools.cern.ch/jira/browse/SAM-442.

Setting alternative BDII for metric org.sam.SRM-All

Metric org.sam.SRM-All uses the sam-bdii.cern.ch top BDII by default. In order to make tests less dependent on the CERN top BDII it is suggested to set an alternative BDII. To do so, create a localdb file (e.g. /etc/ncg/ncg-localdb.d/srm.conf). There are two options:

1. use an alternative top BDII:

    MODIFY_METRIC_PARAMETER!org.sam.SRM-All!--ldap-uri!your.top.bdii

2. use site BDII:

    MODIFY_METRIC_ATTRIBUTE!org.sam.SRM-All!SITE_BDII!--ldap-uri

Setting alternative LFC for metrics org.sam.WN-Rep*

Metrics org.sam.WN-Rep* use the prod-lfc-shared-central.cern.ch LFC by default. In order to set an alternative LFC create a localdb file (e.g. /etc/ncg/ncg-localdb.d/LFC.conf):

    MODIFY_METRIC_PARAMETER!org.sam.CREAMCE-JobState!--wn-lfc!lfc.my.domain
    MODIFY_METRIC_PARAMETER!org.sam.CE-JobState!--wn-lfc!lfc.my.domain

Setting alternative BDII for metric org.sam.CREAMCE-DirectJobState

Metric org.sam.CREAMCE-DirectJobState uses the sam-bdii.cern.ch top BDII by default. In order to make tests less dependent on the CERN top BDII it is suggested to set an alternative BDII. To do so, create a localdb file (e.g. /etc/ncg/ncg-localdb.d/creamcedjs.conf). There are two options:

1. use an alternative top BDII:

    MODIFY_METRIC_PARAMETER!org.sam.CREAMCE-DirectJobState!--ldap-uri!your.top.bdii

2. use site BDII:

    MODIFY_METRIC_ATTRIBUTE!org.sam.CREAMCE-DirectJobState!SITE_BDII!--ldap-uri

Setting alternative list of CEs for metric org.sam.WMS-JobState

If the monitored infrastructure contains a WMS service and no CE services, the metric hr.srce.GoodCEs associated with the Nagios service will fail with the following error:

    HealthyNodes CRITICAL - No healthy hosts found.

There are two options to solve this issue.

1. If the infrastructure contains CREAM-CE services create the file /etc/ncg/ncg-localdb.d/GoodCEs-fix with the following content:

    MODIFY_METRIC_PARAMETER!hr.srce.GoodCEs!--metric!org.sam.CREAMCE-JobSubmit

2. In order to use a static list of CEs or CREAM-CEs create the file /etc/ncg/ncg-localdb.d/GoodCEs-fix with the following content:

    REMOVE_METRIC!hr.srce.GoodCEs

   For each VO supported on the SAM instance create the file /var/lib/gridprobes/<VO_NAME>/GoodCEs and list the CE/CREAM-CE names in it. For example:

    ce1.reliable.my
    ce2.reliable.my
    cream-ce3.reliable.my

   In case VO_FQAN is used (e.g. /ops/Role=lcgadmin), <VO_NAME> should be set to VO_FQAN with "/" replaced with "." (e.g. /var/lib/gridprobes/ops.Role=lcgadmin/GoodCEs).

Monitoring Globus services

Globus services currently do not support VOs. In order to monitor Globus services the SAM administrator has to contact all sites and request that the certificate DN be added to the grid-mapfile.
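For the static GoodCEs lists described earlier, the directory name has to be derived from the VO name or VO_FQAN. The rule can be sketched as follows; fqan_to_dirname and goodces_path are hypothetical helper names, and dropping the leading "/" is inferred from the example path (ops.Role=lcgadmin), not stated explicitly.

```shell
# Convert a VO name or VO_FQAN into the directory name used under
# /var/lib/gridprobes. (Helper names are illustrative.)
fqan_to_dirname() {
    # Drop the leading "/", then turn every remaining "/" into ".".
    printf '%s\n' "${1#/}" | tr '/' '.'
}

# Print the full path of the GoodCEs file for a VO name or VO_FQAN.
goodces_path() {
    printf '/var/lib/gridprobes/%s/GoodCEs\n' "$(fqan_to_dirname "$1")"
}
```

A plain VO name such as ops passes through unchanged, so the same helper covers both cases.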
Removing metrics from alias only (SAM-1645)

    REMOVE_ALIAS_METRIC!alias!metric

Enabling host and site contacts when global notifications (ENABLE_NOTIFICATIONS) are disabled

This is done in localdb:

    ADD_SITE_CONTACT!sitename!emailAddress
    ENABLE_SITE_CONTACT!sitename!emailAddress
    ENABLE_HOSTCONTACT!hostname!emailAddress

Throttling of MyWLCG WEB API

Performance limits in the MyWLCG/MyEGI portal are set by YAIM variables.

    # Limit number of rows that can be fetched at a time to avoid DB dumps.
    MYWLCG_DB_LIMIT=50000
    # Limit number of accesses per IP address in a given time (seconds).
    MYWLCG_ACCESS_PERIOD=5
    MYWLCG_NUMBER_OF_ACCESSES=100

Validation

After a successful YAIM run you should be able to access the Nagios web interface at https://NAGIOS_SERVER/nagios. If you enabled local probes, first check that the MyProxy credential works by running the hr.srce.GridProxy-Get-VO metric on NAGIOS_SERVER. You can do this by forcing a scheduled check via the web interface or via the command line:

    nagios-run-check NAGIOS_SERVER hr.srce.GridProxy-Get-VO

The MyEGI interface is at the address https://NAGIOS_SERVER/myegi.

Check resource BDII:

    ldapsearch -x -LLL -h NAGIOS_SERVER -p 2170 -b Mds-Vo-Name=resource,O=grid "(GlueServiceType=*-NAGIOS)" GlueServiceEndpoint
    dn: GlueServiceUniqueID=NAGIOS_SERVER_XXXXXX-NAGIOS_2937827985,Mds-Vo-name=resource,o=grid
    GlueServiceEndpoint: https://NAGIOS_SERVER:443/nagios

Known Issues

For machines running the latest version of glite-UI (3.2.10-1):

    service nagios restart

Problems

A description of common problems when installing SAM can be found in the Troubleshooting section.
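The two web addresses from the Validation section can be probed from the command line. This is a rough sketch: sam_endpoints and check_endpoints are hypothetical helper names, curl is assumed to be installed, and -k skips CA verification. Because the portal is secured with certificates, a 401/403 status or a failed handshake may simply mean that a client certificate is required, not that the installation is broken.

```shell
# Print the web endpoints that should respond after a successful YAIM run.
# (Helper names are illustrative.)
sam_endpoints() {
    printf 'https://%s/nagios\nhttps://%s/myegi\n' "$1" "$1"
}

# Request each endpoint and report the HTTP status code.
# curl prints 000 when no HTTP response was received at all.
check_endpoints() {
    sam_endpoints "$1" | while read -r url; do
        code="$(curl -k -s -o /dev/null -w '%{http_code}' "$url")"
        echo "$url -> $code"
    done
}
```

Usage: check_endpoints NAGIOS_SERVER, with NAGIOS_SERVER replaced by the host name of the instance.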
Document generated by Confluence on Feb 27, 2014 10:19